Program Representation for General Intelligence
Abstract
Traditional machine learning systems work with relatively flat, uniform data representations, such as feature vectors, time-series, and context-free grammars. However, reality often presents us with data that are best understood in terms of relations, types, hierarchies, and complex functional forms. One possible representational scheme for coping with this sort of complexity is computer programs. This immediately raises the question of how programs are best represented. We propose an answer in the context of ongoing work towards artificial general intelligence.

Background and Motivation

What are programs? The essence of programmatic representations is that they are well-specified, compact, combinatorial, and hierarchical. Well-specified: unlike sentences in natural language, programs are unambiguous; two distinct programs can be precisely equivalent. Compact: programs allow us to compress data on the basis of their regularities. Accordingly, for the purposes of this paper, we do not consider overly constrained representations such as the well-known conjunctive and disjunctive normal forms for Boolean formulae to be programmatic. Although they can express any Boolean function (data), they dramatically limit the range of data that can be expressed compactly, compared to unrestricted Boolean formulae. Combinatorial: programs access the results of running other programs (e.g. via function application), as well as delete, duplicate, and rearrange these results (e.g. via variables or combinators). Hierarchical: programs have an intrinsic hierarchical organization, and may be decomposed into subprograms.

Baum has advanced a theory "under which one understands a problem when one has mental programs that can solve it and many naturally occurring variations" (Bau06). Accordingly, one of the primary goals of artificial general intelligence is systems that can represent, learn, and reason about such programs (Bau06; Bau04). Furthermore, integrative AGI systems such as Novamente (LGP04) may contain subsystems operating on programmatic representations. Would-be AGI systems with no direct support for programmatic representation will clearly need to represent procedures and procedural abstractions somehow. Alternatives such as recurrent neural networks have serious downsides, however, including opacity and inefficiency.

Note that the problem of how to represent programs for an AGI system dissolves in the limiting case of unbounded computational resources. The solution is algorithmic probability theory (Sol64), extended recently to the case of sequential decision theory (Hut05). The latter work defines the universal algorithmic agent AIXI, which in effect simulates all possible programs that are in agreement with the agent's set of observations. While AIXI is uncomputable, the related agent AIXItl may be computed, and is superior to any other agent bounded by time t and space l (Hut05). The choice of a representational language for programs (as well as a language for proofs, in the case of AIXItl) is of no consequence, as it will merely introduce a bias that will disappear within a constant number of time steps (the universal distribution converges quickly (Sol64)). The contribution of this paper is providing practical techniques for approximating the ideal provided by algorithmic probability, based on what Pei Wang has termed the assumption of insufficient knowledge and resources (Wan06).
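To make the compactness contrast concrete, consider the n-bit parity function: an unrestricted Boolean formula expresses it as a linear chain of XORs, whereas any disjunctive normal form for it requires 2^(n-1) terms, one full conjunctive term per odd-weight input. The short Python sketch below is our own illustration (not part of the original paper) and simply counts those terms for small n.

```python
from itertools import product

def parity(bits):
    # Linear-size "program": a chain of XORs over the input bits.
    result = False
    for b in bits:
        result = result != b   # XOR
    return result

def dnf_terms_for_parity(n):
    # A DNF for n-bit parity needs one full conjunctive term (minterm) per
    # satisfying assignment, i.e. per input with an odd number of 1s.
    return sum(1 for bits in product([False, True], repeat=n) if parity(bits))

if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, dnf_terms_for_parity(n))   # 8, 128, 2048 terms, i.e. 2^(n-1)
```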
Given this assumption, how programs are represented is of paramount importance, as is substantiated in the next two sections, where we give a conceptual formulation of what we mean by tractable program representations and introduce tools for formalizing tractability. The fourth section of the paper proposes an approach for tractably representing programs. The fifth and final section concludes and suggests future work.

Representational Challenges

Despite the advantages outlined in the previous section, there are a number of challenges in working with programmatic representations:

• Open-endedness – in contrast to other knowledge representations current in machine learning, programs vary in size and "shape", and there is no obvious problem-independent upper bound on program size. This makes it difficult to represent programs as points in a fixed-dimensional space, or to learn programs with algorithms that assume such a space.

• Over-representation – often, syntactically distinct programs will be semantically identical (i.e. represent the same underlying behavior or functional mapping). Lacking prior knowledge, many algorithms will inefficiently sample semantically identical programs repeatedly (GBK04; Loo07b).

• Chaotic execution – programs that are very similar, syntactically, may be very different, semantically. This presents difficulties for many heuristic search algorithms, which require syntactic and semantic distance to be correlated (TVCC05; Loo07c).

• High resource-variance – programs in the same space vary greatly in the space and time they require to execute.

Based on these concerns, it is no surprise that search over program spaces quickly succumbs to combinatorial explosion, and that heuristic search methods are sometimes no better than random sampling (LP02). Regarding the difficulties caused by over-representation and high resource-variance, one may of course object that determinations of e.g. programmatic equivalence for the former, and e.g. halting behavior for the latter, are uncomputable. Given the assumption of insufficient knowledge and resources, however, these concerns dissolve into the larger issue of computational intractability and the need for efficient heuristics. Determining the equivalence of two Boolean formulae over 500 variables by computing and comparing their truth tables is trivial from a computability standpoint, but, in the words of Leonid Levin, "only math nerds would call 2^500 finite" (Lev94). Similarly, a program that never terminates is a special case of a program that runs too slowly to be of interest to us.

In advocating that these challenges be addressed through "better representations", we do not mean merely trading one Turing-complete programming language for another; in the end it will all come to the same. Rather, we claim that tractably learning and reasoning about programs requires prior knowledge of programming language semantics. The mechanism whereby programs are executed is known a priori, and remains constant across many problems. We have proposed that this knowledge be exploited by representing programs in normal forms that preserve their hierarchical structure, and by heuristically simplifying them based on reduction rules (a simple sketch of such rule-based simplification follows below). Accordingly, one formally equivalent programming language may be preferred over another by virtue of making these reductions and transformations more explicit and concise to describe and to implement.
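The following Python sketch illustrates the kind of reduction-rule simplification referred to above; the particular rules and tuple-based representation are our own illustrative choices, not the authors' actual normal-form implementation. Syntactically distinct programs such as not(not(x)) and and(x, x) collapse to the same reduced form, shrinking the over-represented search space.

```python
def simplify(expr):
    # expr is a variable name (str) or a tuple ('not', e) / ('and', e1, e2).
    if isinstance(expr, str):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]          # simplify subexpressions first
    if op == 'not' and isinstance(args[0], tuple) and args[0][0] == 'not':
        return args[0][1]                       # double negation: not(not(x)) -> x
    if op == 'and':
        if args[0] == args[1]:
            return args[0]                      # idempotence: and(x, x) -> x
        if 'false' in args:
            return 'false'                      # annihilation: and(x, false) -> false
        if 'true' in args:
            return args[1] if args[0] == 'true' else args[0]   # identity element
    return (op, *args)

# Two syntactically distinct programs reduce to the same, smaller form:
assert simplify(('not', ('not', 'x'))) == 'x'
assert simplify(('and', 'x', ('not', ('not', 'x')))) == 'x'
```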
What Makes a Representation Tractable?

Creating a comprehensive formalization of the notion of a tractable program representation would constitute a significant achievement, and we will not attempt it here. We will, however, take a step in that direction by enunciating a set of positive principles for tractable program representations, corresponding closely to the list of representational challenges above.

While the discussion in this section is essentially conceptual rather than formal, we will use a bit of notation to ensure clarity of expression: S denotes a space of programmatic functions of the same type (e.g. all pure Lisp λ-expressions mapping from lists to numbers), and B denotes a metric space of behaviors. In the case of a deterministic, side-effect-free program, execution maps from programs in S to points in B, which will have separate dimensions for function outputs across various inputs of interest, as well as dimensions corresponding to the time and space costs of executing the program. In the case of a program that interacts with an external environment, or is intrinsically nondeterministic, execution will map from S to probability distributions over points in B, which will contain additional dimensions for any side-effects of interest that programs in S might have. Note the distinction between syntactic distance, measured as e.g. tree-edit distance between programs in S, and semantic distance, measured between programs' corresponding points in, or probability distributions over, B. We assume that semantic distance accurately quantifies our preferences in terms of a weighting on the dimensions of B; i.e., if variation along some axis is of great interest, our metric for semantic distance should reflect this.

Let P be a probability distribution over B that describes our knowledge of what sorts of problems we expect to encounter, and let R(n) ⊆ S be the set of all of the programs in our representation with (syntactic) size no greater than n. We will say that "R(n) d-covers the pair (B, P) to extent p" if p is the probability that, for a random behavior b ∈ B chosen according to P, there is some program in R(n) whose behavior is within semantic distance d of b. Then, some among the various properties of tractability that seem important based on the above discussion are as follows (a toy numerical sketch of the d-cover estimate is given below):

• for fixed d, p quickly goes to 1 as n increases,
• for fixed p, d quickly goes to 0 as n increases,
• for fixed d and p, the minimal n needed for R(n) to d-cover (B, P) to extent p should be as small as possible,
• ceteris paribus, syntactic and semantic distance (measured according to P) are highly correlated.

Since execution time and memory usage may be incorporated into the definition of program behavior, minimizing chaotic execution and managing resource variance emerge conceptually here as subcases of maximizing the correlation between syntactic and semantic distance. Minimizing over-representation follows from the desire for small n: roughly speaking, the less over-representation there is, the smaller the average program size that can be achieved. In some cases one can empirically demonstrate the tractability of a representation without any special assumptions about P: for example, in prior work we have shown that adoption of an appropriate hierarchical normal form can generically increase the correlation between syntactic and semantic distance in the space of Boolean functions (Loo06; Loo07c). In this case we may say that we have a generically tractable representation.
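The toy Monte Carlo sketch below operationalizes the d-cover definition for a tiny Boolean space, taking P to be uniform over 3-input truth tables and semantic distance to be normalized Hamming distance between truth tables. All names, and the crude size-bounded enumeration standing in for R(n), are our illustrative assumptions rather than constructs from the paper.

```python
import itertools, random

VARS = 3
INPUTS = list(itertools.product([0, 1], repeat=VARS))

def evaluate(expr, env):
    # expr is a variable index (int) or a tuple ('not', e) / ('and'|'or', e1, e2).
    if isinstance(expr, int):
        return env[expr]
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    return {'and': lambda a, b: a & b,
            'or':  lambda a, b: a | b,
            'not': lambda a: 1 - a}[op](*vals)

def behavior(expr):
    # Semantic point in B: the formula's truth table over all 2^VARS inputs.
    return tuple(evaluate(expr, dict(enumerate(row))) for row in INPUTS)

def semantic_distance(b1, b2):
    # Normalized Hamming distance between truth tables.
    return sum(x != y for x, y in zip(b1, b2)) / len(b1)

def formulas_up_to(size):
    # Crude stand-in for R(n): formulas built by `size` rounds of composition.
    pool = list(range(VARS))
    for _ in range(size):
        new = [('not', a) for a in pool]
        new += [(op, a, b) for op in ('and', 'or')
                for a, b in itertools.product(pool, repeat=2)]
        pool += new
    return pool

def cover_extent(d, size, samples=500):
    # Estimate p: the probability that a behavior drawn from P (here uniform)
    # lies within semantic distance d of some program in R(size).
    reachable = {behavior(e) for e in formulas_up_to(size)}
    hits = 0
    for _ in range(samples):
        target = tuple(random.randint(0, 1) for _ in INPUTS)
        if any(semantic_distance(target, r) <= d for r in reachable):
            hits += 1
    return hits / samples

if __name__ == "__main__":
    for n in (0, 1):
        print(n, cover_extent(d=0.25, size=n))   # p grows as the size bound grows
```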
However, to achieve tractable representation of more complex programs, some fairly strong assumptions about P will be necessary. This should not be philosophically disturbing, since it is clear that human intelligence has evolved in a manner strongly conditioned by certain classes of environments; similarly, what we need to do to create a viable program representation system for pragmatic AGI usage is to achieve tractability relative to the distribution P corresponding to the actual problems the AGI is going to need to solve. Formalizing the distributions P of real-world interest is a difficult problem, and one we will not address here. However, we hypothesize that the representations presented in the following section may be tractable to a significant extent irrespective of P, and even more powerfully tractable with respect to this as-yet unformalized distribution. As weak evidence in favor of this hypothesis, we note that many of the representations presented have proved useful so far in various narrow problem-solving situations.

(Postulated) Tractable Representations

We use a simple type system to distinguish between the various normal forms introduced below. This is necessary to convey the minimal information needed to correctly apply the basic functions in our canonical forms. Various systems and applications may of course augment these with additional type information, up to and including the satisfaction of arbitrary predicates (e.g. a type for prime numbers). This can be overlaid on top of our minimalist system to convey additional bias in selecting which transformations to apply, introducing constraints as necessary. For instance, a call to a function expecting a prime number, made with a potentially composite argument, may be wrapped in a conditional testing the argument's primality. A similar technique is used in the normal form for functions to deal with list arguments that may be empty.
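The Python sketch below illustrates this wrapping technique; the function names and the particular predicates are hypothetical, chosen only to show how a call whose argument may violate a predicate type is guarded by a runtime test, with the same idiom handling possibly-empty list arguments.

```python
def is_prime(n):
    # Predicate corresponding to the hypothetical "prime number" type.
    return n >= 2 and all(n % k for k in range(2, int(n ** 0.5) + 1))

def next_prime_gap(p):
    # Example function whose contract assumes a prime argument.
    q = p + 1
    while not is_prime(q):
        q += 1
    return q - p

def guarded_call(f, predicate, arg, default=None):
    # Wrap the call in a conditional testing the argument's predicate type.
    return f(arg) if predicate(arg) else default

def guarded_head(xs, default=None):
    # Same idiom for list arguments that may be empty.
    return xs[0] if xs else default

print(guarded_call(next_prime_gap, is_prime, 7))   # 4: next prime after 7 is 11
print(guarded_call(next_prime_gap, is_prime, 8))   # None: 8 is composite
print(guarded_head([]))                            # None: empty list handled
```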